[Tests] Diverse Whisper fixes #33665
Conversation
LMK when you want me to review @ylacombe (please fix the tests and the conflicts first 😊)

Hey @LysandreJik, could you review now? It seems that the failing check is unrelated to Whisper!
LysandreJik left a comment:
I'm not totally up to date with the changes in generation_whisper, so I'll trust you with them; if you want a double check, feel free to cc @ArthurZucker, but if you're confident you can go ahead and merge.

cc @ArthurZucker for a quick look!
ArthurZucker left a comment:
LGTM, let's just run the slow tests before we merge? 🤗

cc @ydshieh, there's still this error when doing the slow tests:

Could you rebase and trigger again with `[run-slow] whisper`?

It's working, thanks!

The list of failing tests is as expected, merging!

Great work! I noticed there's a small issue in this PR, especially when […]. To fix this quickly, you can update to:

```python
cross_attentions.append(torch.cat([x[i] for x in generate_outputs.cross_attentions], dim=2)[:, :, num_input_ids - 3:, :])
```
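For context, here is a minimal runnable sketch of what that one-liner does, with toy tensors standing in for real generation outputs. The names `generate_outputs`, `num_input_ids`, and the layer loop are assumptions reconstructed from the snippet, not code from this PR:

```python
import torch

# Toy stand-ins (hypothetical shapes) for the objects in the snippet above:
# `generate()` with output_attentions=True returns cross_attentions as a
# tuple (one entry per generated token) of tuples (one entry per decoder
# layer) of tensors shaped (batch, heads, query_len, key_len).
num_input_ids = 4  # decoder prompt length
step_cross_attentions = [
    (torch.rand(1, 2, num_input_ids, 5),),  # step 1: whole prompt is processed
    (torch.rand(1, 2, 1, 5),),              # step 2: one new query position
    (torch.rand(1, 2, 1, 5),),              # step 3
]

cross_attentions = []
for i in range(1):  # a single decoder layer in this toy example
    # Concatenate the per-step attentions of layer `i` along the query axis,
    # then drop the leading prompt positions; the `num_input_ids - 3` offset
    # mirrors the suggested fix.
    stacked = torch.cat([x[i] for x in step_cross_attentions], dim=2)
    cross_attentions.append(stacked[:, :, num_input_ids - 3:, :])

print(cross_attentions[0].shape)  # torch.Size([1, 2, 5, 5])
```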

Hey @leng-yue, thanks for your message!
Basically […]
* fix beam indices in token_timestamps
* fix attention_mask in FA2
* correct translation example with the right example
* correct how some tests are using outputs + correct num_frames
* fix shortform batch prev cond tests
* make fix-copies
* make fix-copies
* take care of shifting beam indices
* [run-slow] whisper
* [run-slow] whisper
What does this PR do?
There are a lot of pending failing tests for Whisper. This PR addresses some of the issues:
* `decoder_input_ids` were once `forced_input_ids`. This had an impact on `beam_indices`: `beam_indices` has a length of `decoder_input_ids + potentially_generated_ids`, but doesn't take `decoder_input_ids` into account when keeping track of the indices. In other words, `beam_indices[0]` is really the beam index of the first generated token, not of `decoder_input_ids[0]`.
* The Flash-Attention 2 attention mask was causing an issue.
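As an illustration of the first bullet, here is a minimal sketch (hypothetical tensors and token ids, not code from this PR) showing why absolute positions must be shifted by the prompt length before indexing `beam_indices`:

```python
import torch

# Hypothetical Whisper-style forced decoder prompt (4 tokens) followed by
# 6 generated tokens. `beam_indices` only covers the *generated* tokens.
decoder_input_ids = torch.tensor([[50258, 50259, 50359, 50363]])
prompt_len = decoder_input_ids.shape[1]
beam_indices = torch.tensor([[0, 0, 1, 1, 0, 0]])  # one entry per generated token

def beam_for_position(pos: int) -> int:
    """Beam index for absolute sequence position `pos` (prompt + generated)."""
    # beam_indices[0] belongs to the first *generated* token, not to
    # decoder_input_ids[0], so absolute positions must be shifted.
    assert pos >= prompt_len, "prompt positions carry no beam index"
    return int(beam_indices[0, pos - prompt_len])

print(beam_for_position(prompt_len))  # beam of the first generated token -> 0
```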
The remaining work is done on the modeling tests. Note that some of these tests were failing for straightforward reasons (e.g. the output was a dict) and are actually still failing, but no longer for straightforward reasons. Debugging will be easier, though.
Note: with #33450 and this PR, we're down from 29 failing tests to 17.